
    Attaching Translations to Proper Lexical Senses in DBnary

    The DBnary project aims at providing high-quality Lexical Linked Data extracted from different Wiktionary language editions. Data from 10 languages is currently extracted, for a total of over 3.16M translation links connecting lexical entries from the 10 extracted languages to entries in more than one thousand languages. In Wiktionary, glosses are often associated with translations to help users understand which sense they refer to, whether through a textual definition or a target sense number. In this article we aim at extracting as much of this information as possible and then disambiguating the corresponding translations for all available languages. To account for the lack of normalization (e.g. lemmatization and PoS tagging), we adapt various textual and semantic similarity techniques based on partial or fuzzy gloss overlaps to disambiguate the translation relations. We then extract some of the sense-number information to build a gold standard, which serves both to evaluate our disambiguation and to tune and optimize the parameters of the similarity measures. We obtain F-measures of the order of 80% (on par with similar work on English only) across the three languages for which we could generate a gold standard (French, Portuguese, Finnish), and show that most of the disambiguation errors are due to inconsistencies in Wiktionary itself that cannot be detected when DBnary is generated (shifted sense numbers, inconsistent glosses, etc.).
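    The fuzzy gloss-overlap idea described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the DBnary implementation: a crude shared-prefix test stands in for the "fuzzy" matching that compensates for missing lemmatization, and a Dice-style score over gloss tokens picks the best-matching sense definition. All function names are illustrative.

    ```python
    # Hypothetical sketch of gloss-overlap translation disambiguation.
    # Not the actual DBnary code: prefix matching approximates stemming,
    # and a Dice-like overlap score ranks candidate sense definitions.

    def tokens(text):
        """Lowercase alphabetic tokens of a gloss or definition."""
        cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
        return cleaned.split()

    def fuzzy_match(a, b, prefix=4):
        """Tokens 'match' if equal or if both share a 4-char prefix
        (a crude stand-in for lemmatization)."""
        return a == b or (len(a) >= prefix and len(b) >= prefix
                          and a[:prefix] == b[:prefix])

    def overlap_score(gloss, definition):
        """Dice-like score: matched gloss tokens over total token count."""
        g, d = tokens(gloss), tokens(definition)
        if not g or not d:
            return 0.0
        matched = sum(1 for t in g if any(fuzzy_match(t, u) for u in d))
        return 2 * matched / (len(g) + len(d))

    def attach_translation(translation_gloss, sense_definitions):
        """Index of the sense whose definition best overlaps the gloss."""
        scores = [overlap_score(translation_gloss, s) for s in sense_definitions]
        return max(range(len(scores)), key=scores.__getitem__)
    ```

    For example, the gloss "institution for deposits of money" would be attached to a sense defined as "a financial institution that accepts deposits" rather than "the sloping land beside a river", because "institution" and "deposits" overlap while nothing in the second definition does. In practice the parameters (prefix length, score threshold) would be tuned against a gold standard, as the abstract describes.
    
    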

    Constitution d'un corpus de dialogue oral pour l'évaluation automatique de la compréhension hors- et en- contexte du dialogue

    This paper presents and reports on the progress of the EVALDA/MEDIA project, focusing on the recording protocol of the reference dialogue corpus. The aim of this project is to define and test an evaluation methodology that assesses and diagnoses the context-sensitive understanding capability of spoken language dialogue systems. Systems from both academic organizations (CLIPS, IRIT, LIA, LIMSI, LORIA, VALORIA) and industrial sites (FRANCE TELECOM R&D, TELIP) will be evaluated. ELDA is the coordinator of the Technolangue/EVALDA multi-campaign evaluation project, a national initiative sponsored by the French government, of which MEDIA is a sub-campaign. MEDIA began in January 2003. VECSYS provides the recording platform for the project.

    ACOLAD, Plateforme pour l'édition collaborative dépendancielle

    This paper presents an open-source platform for the collaborative editing of dependency corpora. The ACOLAD platform (Annotation de COrpus Linguistique pour l'Analyse de Dépendances) offers manual segmentation and multi-level annotation services: segmentation into words and minimal phrases (chunks), morphosyntactic annotation of words, syntactic annotation of chunks, and syntactic annotation of the dependencies between words or between chunks. We present the ACOLAD platform, then detail the pivot representation used to manage concurrent annotations, and finally describe the mechanism for importing external linguistic resources.

    ACOLAD, un environnement pour l'édition de corpus de dépendances

    No abstract available.

    Performance of two French BERT models for French language on verbatim transcripts and online posts

    Pre-trained models based on the Transformer architecture have achieved notable performance in various language processing tasks. This article compares two pre-trained French versions on a three-class classification task. Two types of datasets are used: a set of annotated verbatim transcripts from face-to-face interviews conducted during a market study, and a set of online posts extracted from a community platform. Little work has been done in these two areas with transcribed oral corpora and online posts in French.